1 Introduction



Adaptive filtering algorithms have been studied extensively, thanks to their simple recursive
forms and wide applicability to diverse practical problems arising in estimation,
identification, adaptive control, and signal processing [26].

Recent rapid advances in science and technology have introduced many emerging applications
in which adaptive filtering is of substantial utility, including consensus controls,
networked systems, and wireless communications; see [1, 2, 4, 5, 7, 8, 12, 13, 14, 16, 17, 18,
19, 20, 23, 24, 27]. One typical feature of these new application domains is that the
underlying systems are inherently time varying and their parameter variations are stochastic
[29, 30, 31]. One important class of such stochastic systems involves systems whose randomly
time-varying parameters can be described by Markov chains. For example, networked systems
include communication channels as part of the system topology. Channel connections,
interruptions, data transmission queuing and routing, and packet delays and losses are always
random. Markov chain models become a natural choice for such systems. For control strategy
adaptation and performance optimization, it is essential to capture time-varying system
parameters during their operations, which leads to the problems of identifying Markovian
regime-switching systems pursued in this paper.

When data acquisition, signal processing, and algorithm implementation are subject to
resource limitations, it is highly desirable to reduce data complexity. This is especially
important when data shuffling involves communication networks. This understanding has
motivated the main theme of this paper: using sign-error updating schemes, which carry
much reduced data complexity, in adaptive filtering algorithms, without detrimental effects
on parameter estimation accuracy and convergence rates.

In our recent work, we developed a sign-regressor algorithm for adaptive filters [28]. The
current paper further develops sign-error adaptive filtering algorithms. It is well known that
sign algorithms have the advantage of reduced computational complexity. The sign operator
reduces the implementation of the algorithms to bits in data communications and simple bit
shifts in multiplications. As such, sign algorithms are highly appealing for practical
applications. The work [11] introduced sign algorithms and has inspired much of the subsequent
development in the field. On the other hand, employing sign operators in adaptive algorithms
has introduced substantial challenges in establishing convergence properties and error
bounds.



A distinctive feature of the algorithms introduced in this paper is the multi-time-scale
framework for characterizing parameter variations and algorithm updating speeds. This is
realized by considering the stepsize of the estimation algorithms and a scaling parameter
that defines the transition rates of the Markov jump process. Depending on the relative
time scales of these two processes, suitably scaled sequences of the estimates are shown to
converge to either an ordinary differential equation, or a set of ordinary differential equations
modulated by random switching, or a stochastic differential equation, or stochastic
differential equations with random switching. Using weak convergence methods, convergence and
rates of convergence of the algorithms are obtained for all these cases.

The rest of the paper is arranged as follows. Section 2 formulates the problem, presents the
sign-error algorithm, and introduces the two-time-scale framework. Section 3 derives mean
squares error bounds for the parameter estimators. By taking appropriate continuous-time
interpolations, Section 4 establishes convergence properties of interpolated sequences of
estimates from the adaptive filtering algorithms. Our analysis is based on weak convergence
methods. The convergence properties are obtained by using martingale averaging techniques.
Section 5 further investigates the rates of convergence. Suitably interpolated sequences are
shown to converge to either stochastic differential equations or randomly switched stochastic
differential equations, depending on relations between the two time scales. Numerical results
by simulation are presented in Section 6 to demonstrate the performance of our algorithms.



2 Problem Formulation

Let

$$y_n = \varphi_n' \alpha_n + e_n, \quad n = 0, 1, \ldots, \tag{1}$$

where $\varphi_n \in \mathbb{R}^r$ is the sequence of regression vectors, $e_n \in \mathbb{R}$ is a sequence of zero mean
random variables representing the error or noise, $\alpha_n \in \mathbb{R}^r$ is the time-varying true parameter
process, and $y_n \in \mathbb{R}$ is the sequence of observation signals at time $n$.

Estimates of $\alpha_n$ are denoted by $\theta_n$ and are given by the following adaptive filtering
algorithm using a sign operator on the prediction error

$$\theta_{n+1} = \theta_n + \mu \varphi_n \,\mathrm{sgn}(y_n - \varphi_n' \theta_n), \tag{2}$$

where $\mathrm{sgn}(y)$ is defined as $\mathrm{sgn}(y) := 1_{\{y>0\}} - 1_{\{y<0\}}$ for $y \in \mathbb{R}$. We impose the following
assumptions.

(A1) $\alpha_n$ is a discrete-time homogeneous Markov chain with state space

$$\mathcal{M} = \{a_1, \ldots, a_{m_0}\}, \quad a_i \in \mathbb{R}^r, \ i = 1, \ldots, m_0, \tag{3}$$

and whose transition probability matrix is given by

$$P^\varepsilon = I + \varepsilon Q, \tag{4}$$

where $\varepsilon > 0$ is a small parameter, $I$ is the $\mathbb{R}^{m_0 \times m_0}$ identity matrix, and $Q = (q_{ij}) \in \mathbb{R}^{m_0 \times m_0}$
is an irreducible generator (i.e., $Q$ satisfies $q_{ij} \ge 0$ for $i \ne j$ and $\sum_{j=1}^{m_0} q_{ij} = 0$
for each $i = 1, \ldots, m_0$) of a continuous-time Markov chain. For simplicity, assume that
the initial distribution of the Markov chain $\alpha_n$ is given by $P(\alpha_0 = a_i) = p_{0,i}$, which is
independent of $\varepsilon$ for each $i = 1, \ldots, m_0$, where $p_{0,i} \ge 0$ and $\sum_{i=1}^{m_0} p_{0,i} = 1$.

(A2) The sequence of signals $\{(\varphi_n, e_n)\}$ is uniformly bounded, stationary, and independent
of the parameter process $\{\alpha_n\}$. Let $\mathcal{F}_n$ be the $\sigma$-algebra generated by $\{(\varphi_j, e_j), \alpha_j :
j < n; \alpha_n\}$, and denote the conditional expectation with respect to $\mathcal{F}_n$ by $E_n$.

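(A3) For each $i = 1, \ldots, m_0$, define

$$\begin{aligned}
g_n &:= \varphi_n \,\mathrm{sgn}(\varphi_n'[\alpha_n - \theta_n] + e_n), \\
g_n(\theta, i) &:= \varphi_n \,\mathrm{sgn}(\varphi_n'[a_i - \theta] + e_n) I_{\{\alpha_n = a_i\}}, \\
\tilde g_n(\theta, i) &:= E_n g_n(\theta, i).
\end{aligned} \tag{5}$$

For each $n$ and $i$, there is an $A_n^{(i)} \in \mathbb{R}^{r \times r}$ such that given $\alpha_n = a_i$,

$$\tilde g_n(\theta, i) = A_n^{(i)}(a_i - \theta) I_{\{\alpha_n = a_i\}} + o(|a_i - \theta| I_{\{\alpha_n = a_i\}}). \tag{6}$$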



(A4) There is a sequence of non-negative real numbers $\{\phi(k)\}$ with $\sum_k \phi^{1/2}(k) < \infty$ such
that for each $n$ and each $j \ge n$, and for some $K > 0$,

$$|E_n A_j^{(i)} - A^{(i)}| \le K \phi^{1/2}(j - n) \tag{7}$$

uniformly in $i = 1, \ldots, m_0$.
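As a concrete illustration of (A1), the following minimal sketch builds $P^\varepsilon = I + \varepsilon Q$ and checks the generator conditions together with the requirement that $\varepsilon$ be small enough for the result to be a stochastic matrix. The function name `transition_matrix` and the tolerance are hypothetical choices made for this sketch; the matrix `Q` is the generator used in the numerical example of Section 6.

```python
def transition_matrix(Q, eps, tol=1e-12):
    """Return P = I + eps * Q (list of rows) after checking the conditions in (A1)."""
    m = len(Q)
    for i, row in enumerate(Q):
        if len(row) != m:
            raise ValueError("Q must be square")
        # Generator conditions: nonnegative off-diagonal entries, zero row sums.
        if any(row[j] < 0 for j in range(m) if j != i):
            raise ValueError("off-diagonal entries of Q must be nonnegative")
        if abs(sum(row)) > tol:
            raise ValueError("each row of Q must sum to zero")
    P = [[(1.0 if i == j else 0.0) + eps * Q[i][j] for j in range(m)]
         for i in range(m)]
    # eps must be small enough that every diagonal entry 1 + eps * q_ii stays nonnegative.
    if any(p < 0 for row in P for p in row):
        raise ValueError("eps too large: I + eps*Q has a negative entry")
    return P


# Generator from the numerical example in Section 6.
Q = [[-0.6, 0.4, 0.2],
     [0.2, -0.5, 0.3],
     [0.4, 0.1, -0.5]]
P = transition_matrix(Q, eps=0.05)
for row in P:
    print([round(p, 4) for p in row])
```

Each row of `P` sums to one because each row of `Q` sums to zero; the only way the construction can fail for a valid generator is a diagonal entry turning negative when `eps` is too large.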
Remark 2.1 Let us take a moment to justify the practicality of the assumptions. The
boundedness assumption in (A2) is fairly mild. For example, we may use a truncated Gaussian
process. In addition, it is possible to accommodate unbounded signals by treating
martingale difference sequences (which make the proofs slightly simpler).

In (A3), we consider that while $g_n(\theta, i)$ is not smooth w.r.t. $\theta$, its conditional expectation
$\tilde g_n(\theta, i)$ can be a smooth function of $\theta$. The condition (6) indicates that $\tilde g_n(\theta, i)$ is locally
(near $a_i$) linearizable. For example, this is satisfied if the conditional joint density of $(\varphi_n, e_n)$
with respect to $\{\varphi_j, e_j, j < n\}$ is differentiable with bounded derivatives; see [6] for more
discussion. Finally, (A4) is essentially a mixing condition which indicates that the remote
past and distant future are asymptotically independent. Hence we may work with correlated
signals as long as the correlation decays sufficiently quickly between iterates.
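To make the model (1) and the sign-error recursion (2) concrete, here is a minimal scalar simulation sketch. Everything beyond (1), (2), and (4) is an assumption of the sketch: the function name `simulate_sign_error`, the fixed initial state, the initial estimate 0, and the Gaussian signals (unbounded, so they only approximate the boundedness required in (A2)).

```python
import random


def simulate_sign_error(n_steps, mu, eps, states, Q, seed=0):
    """Run the sign-error recursion (2) on data generated from model (1), scalar case."""
    rng = random.Random(seed)
    m = len(states)
    state = 0      # index of alpha_0; a fixed start is an assumption of this sketch
    theta = 0.0    # initial estimate theta_0 (assumed)
    alphas, thetas = [], []
    for _ in range(n_steps):
        alpha = states[state]
        alphas.append(alpha)
        thetas.append(theta)
        phi = rng.gauss(0.0, 1.0)      # regressor phi_n
        e = rng.gauss(0.0, 0.5)        # noise e_n with standard deviation 0.5
        y = phi * alpha + e            # observation, model (1)
        err = y - phi * theta          # prediction error
        theta += mu * phi * ((err > 0) - (err < 0))   # sign-error update (2)
        # Advance the chain one step using row `state` of P = I + eps * Q.
        u, acc = rng.random(), 0.0
        for j in range(m):
            acc += (1.0 if j == state else 0.0) + eps * Q[state][j]
            if u < acc:
                state = j
                break
    return alphas, thetas


Q = [[-0.6, 0.4, 0.2],
     [0.2, -0.5, 0.3],
     [0.4, 0.1, -0.5]]
alphas, thetas = simulate_sign_error(1000, mu=0.05, eps=0.03,
                                     states=[-1.0, 0.0, 1.0], Q=Q)
mse = sum((a - t) ** 2 for a, t in zip(alphas, thetas)) / len(alphas)
print("empirical mean squares tracking error:", round(mse, 4))
```

Only the sign of the prediction error enters the update, so each step moves the estimate by exactly $\mu |\varphi_n|$ regardless of how large the error is.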

3 Mean Squares Error Bounds 

Denote the sequence of estimation errors by $\tilde\theta_n := \alpha_n - \theta_n$. We proceed to obtain bounds for
the mean squares error in terms of the transition rate $\varepsilon$ of the parameter and the adaptation
rate $\mu$ of the algorithm.

Theorem 3.1 Assume (A1)-(A4). Then there is an $N_\varepsilon > 0$ such that for all $n \ge N_\varepsilon$,

$$E|\tilde\theta_n|^2 = E|\alpha_n - \theta_n|^2 = O\left(\mu + \varepsilon + \varepsilon^2/\mu\right). \tag{8}$$
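To see how the three terms in the bound (8) trade off, the short sketch below evaluates the order expression $\mu + \varepsilon + \varepsilon^2/\mu$ for the step size and the three choices of $\varepsilon$ used later in Section 6. The helper name `mse_bound_order` is hypothetical, and the numbers are only orders of magnitude: the constant hidden in $O(\cdot)$ is not specified by the theorem.

```python
def mse_bound_order(mu, eps):
    """Order expression mu + eps + eps^2/mu from (8); the O(.) constant is omitted."""
    return mu + eps + eps ** 2 / mu


mu = 0.05
regimes = {
    "eps = (3/5) mu": 0.6 * mu,
    "eps = mu^2": mu ** 2,
    "eps = sqrt(mu)": mu ** 0.5,
}
for name, eps in regimes.items():
    print(f"{name:16s} eps = {eps:.4f}  order = {mse_bound_order(mu, eps):.4f}")
```

When $\varepsilon = \sqrt{\mu}$ the term $\varepsilon^2/\mu$ equals one, so the bound no longer shrinks with $\mu$; this is consistent with the later observation that the estimates cannot track a fast-varying parameter at each iterate.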

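Proof. Define a function by $V(x) = (x'x)/2$. Observe that

$$\tilde\theta_{n+1} = \alpha_{n+1} - \theta_{n+1} = \tilde\theta_n - \mu \varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n) + (\alpha_{n+1} - \alpha_n), \tag{9}$$

so

$$\begin{aligned}
E_n V(\tilde\theta_{n+1}) - V(\tilde\theta_n) &= E_n \tilde\theta_n' \left[(\alpha_{n+1} - \alpha_n) - \mu \varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n)\right] \\
&\quad + E_n \left|(\alpha_{n+1} - \alpha_n) - \mu \varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n)\right|^2.
\end{aligned} \tag{10}$$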
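By (A2), the Markov chain $\alpha_n$ is independent of $(\varphi_n, e_n)$ and $I_{\{\alpha_n = a_i\}}$ is $\mathcal{F}_n$-measurable.
Since the transition matrix is of the form $P^\varepsilon = I + \varepsilon Q$, we obtain

$$\begin{aligned}
E_n(\alpha_{n+1} - \alpha_n) &= \sum_{i=1}^{m_0} E(\alpha_{n+1} - a_i \mid \alpha_n = a_i) I_{\{\alpha_n = a_i\}} \\
&= \sum_{i=1}^{m_0} \Big[\sum_{j=1}^{m_0} a_j(\delta_{ij} + \varepsilon q_{ij}) - a_i\Big] I_{\{\alpha_n = a_i\}} = O(\varepsilon).
\end{aligned} \tag{11}$$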
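Similarly,

$$\begin{aligned}
E_n|\alpha_{n+1} - \alpha_n|^2 &= \sum_{j=1}^{m_0} \sum_{i=1}^{m_0} |a_j - a_i|^2 I_{\{\alpha_n = a_i\}} P(\alpha_{n+1} = a_j \mid \alpha_n = a_i) \\
&= \sum_{j=1}^{m_0} \sum_{i=1}^{m_0} |a_j - a_i|^2 I_{\{\alpha_n = a_i\}} (\delta_{ij} + \varepsilon q_{ij}) = O(\varepsilon).
\end{aligned} \tag{12}$$

Note that $|\tilde\theta_n| = |\tilde\theta_n| \cdot 1 \le (|\tilde\theta_n|^2 + 1)/2$, so

$$O(\varepsilon)|\tilde\theta_n| \le O(\varepsilon)(V(\tilde\theta_n) + 1). \tag{13}$$

Since the signals $\{(\varphi_n, e_n)\}$ are bounded, we have

$$E_n \left|(\alpha_{n+1} - \alpha_n) - \mu \varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n)\right|^2 = E_n|\alpha_{n+1} - \alpha_n|^2 + O(\mu^2 + \mu\varepsilon)[V(\tilde\theta_n) + 1]. \tag{14}$$

Applying (14) to (10), we arrive at

$$\begin{aligned}
E_n V(\tilde\theta_{n+1}) - V(\tilde\theta_n) &= -\mu E_n \tilde\theta_n' \varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n) + E_n \tilde\theta_n'(\alpha_{n+1} - \alpha_n) \\
&\quad + E_n|\alpha_{n+1} - \alpha_n|^2 + O(\mu^2 + \mu\varepsilon)[V(\tilde\theta_n) + 1].
\end{aligned} \tag{15}$$

Note also that by (A3),

$$\begin{aligned}
\mu E_n \tilde\theta_n' \varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n) &= \mu \sum_{i=1}^{m_0} E_n \tilde\theta_n' \varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n) I_{\{\alpha_n = a_i\}} \\
&= \mu \sum_{i=1}^{m_0} \tilde\theta_n' A_n^{(i)} \tilde\theta_n I_{\{\alpha_n = a_i\}} + \mu\, o(|\tilde\theta_n|^2) \\
&= \mu \sum_{i=1}^{m_0} \tilde\theta_n' A^{(i)} \tilde\theta_n I_{\{\alpha_n = a_i\}} + \mu \sum_{i=1}^{m_0} \tilde\theta_n' [A_n^{(i)} - A^{(i)}] \tilde\theta_n I_{\{\alpha_n = a_i\}} + \mu\, o(|\tilde\theta_n|^2).
\end{aligned} \tag{16}$$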

To treat the first three terms in (15), we define the following perturbed Liapunov functions 

by 

oo mo 

j=n 1=1 
oo 

V,^(9, n) := 0'En{a,+i - a,) (17) 

j=n 

oo 



j=n 
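By virtue of (A4), we have

$$|V_1^\mu(\tilde\theta, n)| \le \mu K \sum_{i=1}^{m_0} |\tilde\theta|^2 \sum_{j=n}^{\infty} \phi^{1/2}(j - n) \le O(\mu)[V(\tilde\theta) + 1]. \tag{18}$$

Note also that the irreducibility of $Q$ implies that of $I + \varepsilon Q$ for sufficiently small $\varepsilon > 0$. Thus
there is an $N_\varepsilon$ such that for all $n \ge N_\varepsilon$, $|(I + \varepsilon Q)^n - \mathbb{1}\nu_\varepsilon| \le \lambda_c^n$ for some $0 < \lambda_c < 1$, where $\nu_\varepsilon$
denotes the stationary distribution associated with the transition matrix $I + \varepsilon Q$. Note that
the difference of the $j + 1 - n$ and $j - n$ step transition matrices is given by

$$\begin{aligned}
(I + \varepsilon Q)^{j+1-n} - (I + \varepsilon Q)^{j-n} &= [(I + \varepsilon Q) - I](I + \varepsilon Q)^{j-n} \\
&= [(I + \varepsilon Q) - I][(I + \varepsilon Q)^{j-n} - \mathbb{1}\nu_\varepsilon] + [(I + \varepsilon Q) - I]\mathbb{1}\nu_\varepsilon \\
&= (\varepsilon Q)[(I + \varepsilon Q)^{j-n} - \mathbb{1}\nu_\varepsilon].
\end{aligned}$$

The last line above follows from the fact $Q\mathbb{1} = 0$, hence $[(I + \varepsilon Q) - I]\mathbb{1}\nu_\varepsilon = 0$. Thus

$$\sum_{j=n}^{\infty} \left|(I + \varepsilon Q)^{j+1-n} - (I + \varepsilon Q)^{j-n}\right| \le O(\varepsilon) \sum_{j=n}^{\infty} \lambda_c^{j-n} = O(\varepsilon). \tag{19}$$

The foregoing estimates lead to $\sum_{j=n}^{\infty} E_n(\alpha_{j+1} - \alpha_j) = O(\varepsilon)$ and as a result

$$|V_2^\mu(\tilde\theta, n)| \le O(\varepsilon)(V(\tilde\theta) + 1), \tag{20}$$

and similarly

$$|V_3^\mu(n)| = O(\varepsilon), \tag{21}$$

so all the perturbations can be made small.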
Now, we note that 

EnVl'{en+un + l)-Vl'{en,n) 

= Er^VnOn+un + l) - + 1) + + 1) - Kr(^.,n). 



(22) 



where 



and 



mo 



Er^VnOn, n + 1)- VnOn, u) = flj^ [4^^ ' A^'W{a.=a,} (23) 



i=l 



E^Vl^iOr^+un + 1) - Er.Vl'ien, U + 1) 
oo mo 

= /i 5^ J^E^dn+i-e^yiAf -A^'%^^i{^^=,^} 

j=n+l i=l 

oo mo 

+ E Y.^MAf-A(^Wn+l-en)Iia„=a.}. 
j=n+l i=l 



(24) 
Using (11), we have

$$E_n|\tilde\theta_{n+1} - \tilde\theta_n| \le E_n|\alpha_{n+1} - \alpha_n| + \mu E_n|\varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n)| = O(\varepsilon + \mu). \tag{25}$$



Thus, in view of (A4) 



oo mo 



j=n+l i=l 



<0{fi' + fie)[V{9r,) + l], (26) 



and 



^ ^-E'n(6'„,+ l — 6'„)'-E'„+l[v4j''' — A^''^]6n+ll{an=ai} 
j=n+l i=l 

Putting together (22)-(27), we establish that



<0{fi^ + fie)[V{9r,) + l]. (27) 



mo 



E^Vl'{9n+,,n + 1) - Vr{9^,n) = fiJ^^M'^f - + 0{fi^ + fie)[V{9n) + 1]. 

1=1 

(28) 



Likewise, we can obtain 



EnV^^{9^+i, n + 1)- V^{9n, n) = -EMa^+i - a^) + 0{e^ + fi^) 



(29) 



and 



Now we define 



E^Vi'in + 1) - Vi'in) = - «„P + 0{e' 



W{9, n) = V{9) + ¥^{9, n) + ¥^^{9, n) + ^3^(72). 



(30) 



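Since each $A^{(i)}$ is a stable matrix there is a $\lambda > 0$ such that $\tilde\theta' A^{(i)} \tilde\theta \ge \lambda V(\tilde\theta)$ for each $i$. Thus
we may take $\lambda$ such that $-\mu \sum_{i=1}^{m_0} \tilde\theta' A^{(i)} \tilde\theta I_{\{\alpha_n = a_i\}} - \mu\, o(|\tilde\theta|^2) \le -\lambda\mu V(\tilde\theta)$. Using this along with
(10), (16), (28)-(30), and the inequality $O(\mu\varepsilon) = O(\mu^2 + \varepsilon^2)$, we arrive at

$$\begin{aligned}
E_n W(\tilde\theta_{n+1}, n+1) - W(\tilde\theta_n, n) &= -\mu \sum_{i=1}^{m_0} \tilde\theta_n' A^{(i)} \tilde\theta_n I_{\{\alpha_n = a_i\}} - \mu\, o(|\tilde\theta_n|^2) + O(\mu^2 + \varepsilon^2)[V(\tilde\theta_n) + 1] \\
&\le -\lambda\mu V(\tilde\theta_n) + O(\mu^2 + \varepsilon^2)[V(\tilde\theta_n) + 1] \\
&\le -\lambda\mu W(\tilde\theta_n, n) + O(\mu^2 + \varepsilon^2)[W(\tilde\theta_n, n) + 1].
\end{aligned} \tag{31}$$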
Choose $\mu$ and $\varepsilon$ small enough so that there is a $\lambda_0 > 0$ satisfying $\lambda_0 \le \lambda$ and

$$-\lambda\mu + O(\mu^2) + O(\varepsilon^2) \le -\lambda_0\mu.$$

Then we obtain

$$E_n W(\tilde\theta_{n+1}, n+1) \le (1 - \lambda_0\mu) W(\tilde\theta_n, n) + O(\mu^2 + \varepsilon^2).$$

Note that there is an $N_\varepsilon > 0$ such that $(1 - \lambda_0\mu)^n \le O(\mu)$ for $n \ge N_\varepsilon$. Taking expectation
in the iteration for $W(\tilde\theta_n, n)$ and iterating on the resulting inequality yield

$$E W(\tilde\theta_{n+1}, n+1) \le (1 - \lambda_0\mu)^n W(\tilde\theta_0, 0) + O\left(\mu + \varepsilon^2/\mu\right).$$

Thus

$$E W(\tilde\theta_{n+1}, n+1) \le O(\mu + \varepsilon^2/\mu).$$

Finally, applying (18)-(21) again, we also obtain

$$E V(\tilde\theta_{n+1}) \le O(\mu + \varepsilon + \varepsilon^2/\mu).$$

Thus the desired result follows. □
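The last step of the proof iterates a scalar inequality of the form $w_{n+1} \le (1 - \lambda_0\mu) w_n + c(\mu^2 + \varepsilon^2)$. A minimal sketch of why this gives the order $\mu + \varepsilon^2/\mu$: treating the inequality as an equality, the iteration contracts geometrically to the fixed point $c(\mu^2 + \varepsilon^2)/(\lambda_0\mu)$. The constants `lam0` and `c` and both function names are hypothetical; the proof only asserts that such constants exist.

```python
def iterate_bound(w0, mu, eps, lam0, c, n_steps):
    """Iterate w <- (1 - lam0*mu) * w + c * (mu^2 + eps^2), the equality version of the bound."""
    w = w0
    for _ in range(n_steps):
        w = (1.0 - lam0 * mu) * w + c * (mu ** 2 + eps ** 2)
    return w


def fixed_point(mu, eps, lam0, c):
    """Limit of the iteration: c * (mu^2 + eps^2) / (lam0 * mu), of order mu + eps^2/mu."""
    return c * (mu ** 2 + eps ** 2) / (lam0 * mu)


mu, eps, lam0, c = 0.05, 0.03, 1.0, 1.0   # illustrative values only
print(iterate_bound(10.0, mu, eps, lam0, c, 1000))
print(fixed_point(mu, eps, lam0, c))
```

The initial value is forgotten at rate $(1 - \lambda_0\mu)^n$, which is why the bound holds only after a burn-in of $N_\varepsilon$ steps.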

4 Convergence Properties 
4.1 Switching ODE Limit: $\mu = O(\varepsilon)$

We assume the adaptation rate and the transition frequency are of the same order, that is
$\mu = O(\varepsilon)$. For simplicity, we take $\mu = \varepsilon$. To study the asymptotic properties of the sequence
$\{\theta_n\}$, we take a continuous-time interpolation of the process. Define

$$\theta^\mu(t) = \theta_n, \quad \alpha^\mu(t) = \alpha_n, \quad \text{for } t \in [n\mu, n\mu + \mu).$$

We proceed to prove that $\theta^\mu(\cdot)$ converges weakly to a system of randomly switching ordinary
differential equations.

Theorem 4.1 Assume (A1)-(A4) hold and $\varepsilon = \mu$. Then the process $(\theta^\mu(\cdot), \alpha^\mu(\cdot))$ converges
weakly to $(\theta(\cdot), \alpha(\cdot))$ such that $\alpha(\cdot)$ is a continuous-time Markov chain generated by $Q$ and
the limit process $\theta(\cdot)$ satisfies the Markov switched ordinary differential equation

$$\dot\theta(t) = A^{(\alpha(t))}(\alpha(t) - \theta(t)), \quad \theta(0) = \theta_0. \tag{32}$$
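The limit in Theorem 4.1 can be visualized with a simple forward-Euler sketch of the switching ODE, reading (32) as $\dot\theta(t) = A^{(\alpha(t))}(\alpha(t) - \theta(t))$, the form that also appears in the completion of the proof. The scalar values in `A`, the fixed initial state, the step size, and the function name are all assumptions made for illustration; the chain is simulated approximately by allowing at most one jump per Euler step.

```python
import random


def simulate_switching_ode(T, dt, states, Q, A, theta0, seed=0):
    """Forward-Euler path of theta' = A[i] * (states[i] - theta), with i a Markov chain with generator Q."""
    rng = random.Random(seed)
    i, theta = 0, theta0              # chain starts in states[0] (assumed)
    path = [(0.0, states[i], theta)]
    for step in range(1, int(round(T / dt)) + 1):
        theta += dt * A[i] * (states[i] - theta)     # Euler step of the switched ODE
        rate = -Q[i][i]                              # total jump rate out of state i
        if rng.random() < rate * dt:                 # jump with probability ~ rate * dt
            u, acc = rng.random() * rate, 0.0
            for j in range(len(states)):
                if j == i:
                    continue
                acc += Q[i][j]
                if u < acc:
                    i = j
                    break
        path.append((step * dt, states[i], theta))
    return path


Q = [[-0.6, 0.4, 0.2],
     [0.2, -0.5, 0.3],
     [0.4, 0.1, -0.5]]
A = [1.6, 1.6, 1.6]   # hypothetical values of A^(i), not taken from the paper
path = simulate_switching_ode(20.0, 0.01, [-1.0, 0.0, 1.0], Q, A, theta0=0.0, seed=3)
print(path[-1])
```

Between jumps of the chain the path relaxes exponentially toward the current state, which is the qualitative tracking behavior the theorem describes.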
The theorem is established through a series of lemmas. We begin by using a truncation
device to bound the estimates. Define $S_N = \{\theta \in \mathbb{R}^r : |\theta| \le N\}$ to be the ball with radius
$N$, and $q^N(\cdot)$ as a truncation function that is equal to 1 for $\theta \in S_N$, 0 for $\theta$ outside $S_{N+1}$, and
sufficiently smooth between. Then we modify algorithm (2) so that

$$\theta^N_{n+1} := \theta^N_n + \mu \varphi_n \,\mathrm{sgn}(y_n - \varphi_n' \theta^N_n)\, q^N(\theta^N_n), \quad n = 0, 1, \ldots, \tag{33}$$

is now a bounded sequence of estimates. As before, define

$$\theta^{\mu,N}(t) := \theta^N_n \quad \text{for } t \in [\mu n, \mu n + \mu).$$

We shall first show that the sequence $\{\theta^{\mu,N}(\cdot), \alpha^\mu(\cdot)\}$ is tight, and thus by Prohorov's
theorem we may extract a convergent subsequence. We will then show the limit satisfies a
switched differential equation. Lastly, we let the truncation bound $N$ grow and show the
untruncated sequence given by (2) is also weakly convergent.
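One possible truncation function of the kind described above can be sketched as follows. The text only asks for a function that equals 1 on the ball of radius $N$, vanishes outside the ball of radius $N+1$, and is sufficiently smooth in between; the cubic "smoothstep" blend used here is an assumption of this sketch (it is continuously differentiable, and a smoother blend could be substituted if more regularity is needed).

```python
def q_N(theta_norm, N):
    """Truncation weight as a function of |theta|: 1 inside S_N, 0 outside S_{N+1}."""
    if theta_norm <= N:
        return 1.0
    if theta_norm >= N + 1:
        return 0.0
    s = theta_norm - N                        # position in the transition shell, 0 < s < 1
    return 1.0 - (3.0 * s ** 2 - 2.0 * s ** 3)   # cubic blend with zero slope at both ends


for r in (0.5, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5):
    print(r, q_N(r, N=2))
```

With such a weight the increment in the truncated recursion is switched off once the estimate leaves the larger ball, which is what keeps the truncated sequence bounded.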

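Lemma 4.2 The sequence $(\theta^{\mu,N}(\cdot), \alpha^\mu(\cdot))$ is tight in $D([0, \infty) : \mathbb{R}^r \times \mathcal{M})$.

Proof of Lemma 4.2. Note that the sequence $\alpha^\mu(\cdot)$ is tight by virtue of [33, Theorem
4.3]. In addition, $\alpha^\mu(\cdot)$ converges weakly to a Markov chain generated by $Q$. To proceed, we
examine the asymptotics of the sequence $\theta^{\mu,N}(\cdot)$. We have that for any $\delta > 0$, and $t, s > 0$
satisfying $s \le \delta$,

$$\begin{aligned}
E_t^\mu \left|\theta^{\mu,N}(t + s) - \theta^{\mu,N}(t)\right|^2 &\le E_t^\mu \Big| \mu \sum_{k=t/\mu}^{(t+s)/\mu - 1} \varphi_k \,\mathrm{sgn}(y_k - \varphi_k' \theta^N_k)\, q^N(\theta^N_k) \Big|^2 \\
&\le \mu^2 E_t^\mu \sum_{j=t/\mu}^{(t+s)/\mu - 1} \sum_{k=t/\mu}^{(t+s)/\mu - 1} \varphi_j' \varphi_k \,\mathrm{sgn}(y_j - \varphi_j' \theta^N_j)\,\mathrm{sgn}(y_k - \varphi_k' \theta^N_k)\, q^N(\theta^N_j)\, q^N(\theta^N_k) \\
&\le \mu^2 \sum_{j=t/\mu}^{(t+s)/\mu - 1} \sum_{k=t/\mu}^{(t+s)/\mu - 1} \big(E_t^\mu|\varphi_j|^2\big)^{1/2} \big(E_t^\mu|\varphi_k|^2\big)^{1/2} \le O(s^2) \le O(\delta^2).
\end{aligned} \tag{34}$$

For any $T < \infty$ and any $0 \le t \le T$, using $E_t^\mu$ to denote the conditional expectation w.r.t. the
$\sigma$-algebra $\mathcal{F}_t^\mu$, we have

$$\lim_{\delta \to 0} \limsup_{\mu \to 0} \Big\{ \sup_{0 \le s \le \delta} E\big[E_t^\mu |\theta^{\mu,N}(t + s) - \theta^{\mu,N}(t)|^2\big] \Big\} = 0.$$

Applying the criterion [15, p. 47], the tightness is proved. □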
Since $(\theta^{\mu,N}(\cdot), \alpha^\mu(\cdot))$ is tight, it is sequentially compact. By virtue of Prohorov's theorem,
we can extract a weakly convergent subsequence. Select such a subsequence and still denote
it by $(\theta^{\mu,N}(\cdot), \alpha^\mu(\cdot))$ for notational simplicity. Denote the limit by $(\theta^N(\cdot), \alpha(\cdot))$. We proceed
to characterize the limit process.

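Lemma 4.3 The sequence $(\theta^{\mu,N}(\cdot), \alpha^\mu(\cdot))$ converges weakly to $(\theta^N(\cdot), \alpha(\cdot))$ that is a solution
of the martingale problem with operator

$$L^N f(\theta^N, a_i) := \nabla f'(\theta^N, a_i) A^{(i)} [a_i - \theta^N]\, q^N(\theta^N) + \sum_{j=1}^{m_0} q_{ij} f(\theta^N, a_j), \tag{35}$$

where for each $i \in \mathcal{M}$, $f(\cdot, i) \in C_0^1$ ($C^1$ functions with compact support).

Proof. To derive the martingale limit, we need only show that for the $C^1$ function with
compact support $f(\cdot, i)$, for each bounded and continuous function $h(\cdot)$, each $t, s > 0$, each
positive integer $\kappa$, and each $t_i \le t$ for $i \le \kappa$,

$$E h(\theta^N(t_i), \alpha(t_i) : i \le \kappa) \Big[ f(\theta^N(t + s), \alpha(t + s)) - f(\theta^N(t), \alpha(t)) - \int_t^{t+s} L^N f(\theta^N(\tau), \alpha(\tau))\, d\tau \Big] = 0. \tag{36}$$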

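To verify (36), we use the processes indexed by $\mu$. As before, note that

$$\theta^{\mu,N}(t + s) - \theta^{\mu,N}(t) = \sum_{k=t/\mu}^{(t+s)/\mu - 1} \mu \varphi_k \,\mathrm{sgn}(\varphi_k'[\alpha_k - \theta^N_k] + e_k)\, q^N(\theta^N_k). \tag{37}$$

Subdivide the interval with the end points $t/\mu$ and $(t + s)/\mu - 1$ by choosing $m_\mu$ such that
$m_\mu \to \infty$ as $\mu \to 0$ but $\delta_\mu = \mu m_\mu \to 0$. By the smoothness of $f(\cdot, \alpha)$, it is readily seen that
as $\mu \to 0$,

$$\begin{aligned}
&E h(\theta^{\mu,N}(t_i), \alpha^\mu(t_i) : i \le \kappa) \left[ f(\theta^{\mu,N}(t + s), \alpha^\mu(t + s)) - f(\theta^{\mu,N}(t), \alpha^\mu(t)) \right] \\
&\quad \to E h(\theta^N(t_i), \alpha(t_i) : i \le \kappa) \left[ f(\theta^N(t + s), \alpha(t + s)) - f(\theta^N(t), \alpha(t)) \right].
\end{aligned} \tag{38}$$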
Next, we insert a term to examine the change in the parameter $\alpha$ and the estimate
separately

hm EhiQ^^^it,), a^iU) : t < k) IfXe^'^U + s), a^it + s)) - f(e^'^(t), 



t+s 



(39) 



t+s 



18 ^=t 

First, we work with the last term in (39). By using a Taylor expansion on each interval 
indexed by / we have 

lim £:;i(^^'^(t,), a^{U) ■.^<K)[Y, [/(C+™,, - /(C,, 



t+s — 1 

hm i?/i(^^'^(t,), af^iU) : z < «:) V k— V V/' 



I5u=t 



k=lmn 



(40) 



xy,,sgn(v,U«fc-^,^) + e,)g^(^r 

im^+/x— 1 



where 9^~^ is a point on the line segment joining 9^^^ and Oj^^^^^. Since 

and Vf'{-,i) is smooth, we have the last term in (40) is o(l) in the sense of in probability 
as /i — !■ 0. To work with the first term we insert the conditional expectation and apply 
(6) to obtain 

t+s im^+m^-l 



t,-po -| I 

hm Eh{9^^>^{U), a^{t,) : z < «:) V "^m— E ^/ 



X Efc[</?;,sgn((^;,(afc - ) + ek)]q 



t, I tj " -1 - ■ —/J, • fj. — 

limi?/i(^^''^(t,),a'^(t,) : z < «:) V E^a'— E ^/'( 

lOf^=t j=l ^ k=lm^ 

- 0^) + o{\a, - 1)1 g^(^f )J|«,=.^|. 



(41) 
Then for small $\mu$,

lin^+m^ — l 



(42) 



Letting $\mu l m_\mu \to \tau$, then by (7),



t+s mo J- 
lufj,=t j=l ^ k=lm^ 



i+s mo r- im^+m,j-l 
On 



k=lmu 



t+s mo 

is^,=t j=l 



k=lmn 



t+s 



Ehie''{U),aiU) -.iKk) Vf (^^(r),«(r))A("M)[a(r) - ^^(r)]g^(r)rfr. 



(43) 



Likewise, we can obtain 



t+s 



N 

Irrin+m^ i 



(44) 



Eh{e'\U),a{ti) ■.i<K) 



t+s 



Qf\e'\r),a{r))dr 



Combining (40)-(44) with (39), we have established (36) as desired, completing the proof of 
the lemma. □ 

Completion of the Proof of Theorem 4.1. From Lemma 4.3, we have the truncated
sequence $\theta^N(\cdot)$ satisfies the switched ODE $\dot\theta^N(t) = A^{(\alpha(t))}[\alpha(t) - \theta^N(t)]\, q^N(\theta^N(t))$, $\theta^N(0) = \theta_0$.
Next, letting $N \to \infty$, we show that the limit of the untruncated sequence $\theta(\cdot)$ and the limit
of $\theta^N(\cdot)$ as $N \to \infty$ are the same. The argument is similar to that of [17, pp. 249-250]; we
explain the main steps below. Let $P^0(\cdot)$ and $P^N(\cdot)$ be the measures induced by $\theta(\cdot)$ and
$\theta^N(\cdot)$, respectively. Since the martingale problem with operator $L$ has a unique solution,
the associated differential equation has a unique solution for each initial condition and $P^0(\cdot)$
is unique. For each $T < \infty$ and $t \le T$, $P^0(\cdot)$ agrees with $P^N(\cdot)$ on all Borel subsets of the
set of paths in $D[0, \infty)$ with values in $S_N$. By using $P^0(\sup_{t \le T} |\theta(t)| \le N) \to 1$ as $N \to \infty$,
and the weak convergence of $\theta^{\mu,N}(\cdot)$ to $\theta^N(\cdot)$, we have $\theta^\mu(\cdot)$ converges weakly to $\theta(\cdot)$. Thus
the proof of Theorem 4.1 is completed. □

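Remark 4.4 The following calculation will be used for both the slow and fast Markov
chain cases. The result is essentially one about two-time-scale Markov chains considered
in [33]. Define a probability vector by $p_n^\varepsilon = (P(\alpha_n = a_1), \ldots, P(\alpha_n = a_{m_0})) \in \mathbb{R}^{1 \times m_0}$.
Note that $p_0^\varepsilon = (p_{0,1}, \ldots, p_{0,m_0})$ (independent of $\varepsilon$). Because the Markov chain is time
homogeneous, $(P^\varepsilon)^n$ is the $n$-step transition probability matrix with $P^\varepsilon = I + \varepsilon Q$. Then, for
some $0 < \lambda_1 < 1$,

$$p_n^\varepsilon = p(\varepsilon n) + O(\varepsilon + \lambda_1^{n}), \quad 0 \le n \le O(1/\varepsilon), \tag{45}$$

where $p(t) = (p_1(t), \ldots, p_{m_0}(t))$ is the probability vector of the continuous-time Markov chain
with generator $Q$ such that for all $t \ge 0$

$$\frac{dp(t)}{dt} = p(t) Q, \quad p(0) = p_0, \tag{46}$$

and $p_0$ is the initial probability. In addition,

$$(P^\varepsilon)^{n - n_0} = \Xi(\varepsilon n_0, \varepsilon n) + O(\varepsilon + \lambda_1^{n - n_0}), \tag{47}$$

where with $t_0 = \varepsilon n_0$ and $t = \varepsilon n$, $\Xi(t_0, t)$ satisfies

$$\frac{d\Xi(t_0, t)}{dt} = \Xi(t_0, t) Q, \quad \Xi(t_0, t_0) = I. \tag{48}$$

Define the continuous-time interpolation $\alpha^\varepsilon(t)$ of $\alpha_n$ as

$$\alpha^\varepsilon(t) := \alpha_n \quad \text{for } t \in [n\varepsilon, n\varepsilon + \varepsilon). \tag{49}$$

Then $\alpha^\varepsilon(\cdot)$ converges weakly to $\alpha(\cdot)$, which is a continuous-time Markov chain generated by
$Q$ with state space $\mathcal{M}$. The mean $E\alpha_n$ can be approximated by

$$E\alpha_n = \alpha_*(\varepsilon n) + O(\varepsilon + \lambda_1^{n}), \quad \text{for } n \le O(1/\varepsilon), \qquad \alpha_*(\varepsilon n) := \sum_{j=1}^{m_0} a_j p_j(\varepsilon n).$$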
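The approximation of the discrete-time distribution $p_0 (I + \varepsilon Q)^n$ by the solution $p(\varepsilon n)$ of $dp/dt = pQ$ can be checked numerically. The sketch below is an illustration under stated assumptions: it borrows the generator and initial distribution from Section 6, integrates the forward equation with a classical Runge-Kutta step as a stand-in for the exact solution, and uses hypothetical helper names.

```python
def vec_mat(p, M):
    """Row vector times matrix."""
    m = len(p)
    return [sum(p[i] * M[i][j] for i in range(m)) for j in range(m)]


def discrete_distribution(p0, Q, eps, n):
    """Distribution after n steps of the chain with P = I + eps * Q."""
    m = len(Q)
    P = [[(1.0 if i == j else 0.0) + eps * Q[i][j] for j in range(m)] for i in range(m)]
    p = list(p0)
    for _ in range(n):
        p = vec_mat(p, P)
    return p


def continuous_distribution(p0, Q, t, steps=2000):
    """Approximate p(t) solving dp/dt = p Q, p(0) = p0, with RK4 steps."""
    h = t / steps
    p = list(p0)
    for _ in range(steps):
        k1 = vec_mat(p, Q)
        k2 = vec_mat([x + 0.5 * h * k for x, k in zip(p, k1)], Q)
        k3 = vec_mat([x + 0.5 * h * k for x, k in zip(p, k2)], Q)
        k4 = vec_mat([x + h * k for x, k in zip(p, k3)], Q)
        p = [x + h * (a + 2 * b + 2 * c + d) / 6.0
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def gap(p0, Q, eps, t):
    """Largest componentwise difference between the discrete and continuous distributions at time t."""
    pd = discrete_distribution(p0, Q, eps, int(round(t / eps)))
    pc = continuous_distribution(p0, Q, t)
    return max(abs(a - b) for a, b in zip(pd, pc))


Q = [[-0.6, 0.4, 0.2],
     [0.2, -0.5, 0.3],
     [0.4, 0.1, -0.5]]
p0 = [0.75, 0.125, 0.125]
for eps in (0.01, 0.001):
    print(eps, gap(p0, Q, eps, t=1.0))
```

The printed gaps shrink roughly in proportion to `eps`, in line with the $O(\varepsilon)$ term of the approximation.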
4.2 Slowly-Varying Markov Chain: $\varepsilon \ll \mu$

In this case, since the Markov chain changes so slowly, the time-varying parameter process
is essentially a constant. To facilitate the discussion and to fix notation, we take $\varepsilon = \mu^{1+\Delta}$
for some $\Delta > 0$ in what follows.

The analysis is similar to the $\varepsilon = O(\mu)$ case. Begin by defining the continuous-time
interpolation as before. While a truncation device is still needed, we omit it and assume the
iterates are bounded for notational brevity. The tightness of $\{\theta^\mu(\cdot)\}$ can be verified similarly
to Lemma 4.2. To characterize the weak limit we note that the estimates from the previous
section remain valid, except those involving the Markov chain $\alpha_k$. Thus we need only examine
(from the second to last line of (43))

t+s mo Zmp+m^-1 

lSi_i=t j=l k=lmfj_ ^ 

t+s mo /m^+m^-1 

= limi?/i(^^''^(t,),«^(t.) ■.^<^)Y. E 

lOfi=t j=l ^ k=lmn 

t+s mo 

''^ is^=t j=l 

/m.p+m,, — 1 mo mo 

10 ) 



X — ^ ^ ^A^^'^ajPiak = aj\aim^ = aijP{aim^ = ajjao = ai^^)P{ao = a 

^ k=lm.fi ii=l io=l 

mo i-t+s 

Ehie'^iU), aiU) ■.i<K)y^ Vf{0{T), a(r))A(*°)a,„P(ao = a,,)dT. 

Jt 



(50) 

To obtain the last line above, we have used that for /m^ < k < Irrifj, + rrifj, since e = /i^'*'^, 
elm^ + < /i'^(t + s) + 5^ — > as /i -> 0, we have by Remark 4.4, 

(pe)fc-im„ _ 2^^^^ ^i^^^ +0(^e + A^^''"'"'"^) 
— / as /i — )■ 0, 

— )■ / as yU — )■ 0. 
We omit the details, but present the main result as follows. 

Theorem 4.5 Assume (A1)-(A4) hold, and $\varepsilon = \mu^{1+\Delta}$ for some $\Delta > 0$. Then we have $\theta^\mu(\cdot)$
converges weakly to $\theta(\cdot)$ such that $\theta(\cdot)$ is the unique solution of the differential equation

$$\frac{d}{dt}\theta(t) = \sum_{i=1}^{m_0} A^{(i)}(a_i - \theta(t)) P(\alpha_0 = a_i), \quad \theta(0) = \theta_0. \tag{51}$$

4.3 Fast-Varying Markov Chain: $\mu \ll \varepsilon$

The idea for the fast-varying chain is that the parameter changes so fast that it quickly
approaches the stationary distribution of the Markov chain. As a result, the limit dynamic
system is one that is averaged out with respect to the stationary distribution of the Markov
chain. In this section, we take $\varepsilon = \mu^\gamma$ where $1/2 < \gamma < 1$. Then, letting $\mu l m_\mu \to \tau$ as in the
proof of Theorem 4.1, we have $\varepsilon(k - l m_\mu) = \mu^\gamma (k - l m_\mu) \to \infty$. Thus, for some $0 < \lambda_1 < 1$,

$$\Xi_{ij}(\varepsilon l m_\mu, \varepsilon k) = \nu_j + O(\varepsilon + \lambda_1^{k - l m_\mu}),$$

where $\nu = (\nu_1, \ldots, \nu_{m_0})$ is the stationary distribution of the continuous-time Markov chain
with generator $Q$, and $\Xi_{ij}(s_1, s_2)$ denotes the $ij$th entry of the matrix $\Xi(s_1, s_2)$. Therefore, we
can show that as $\mu \to 0$,

t+s mo im,i+m,;j-l 



lSfj,=t j=l ' K=im^ 

™0 pt+S 



^ Eh{e^{ti), a(t,) ■.i<K)Y / V/'(^(r), a{T))A^^^ajiyjdT. 

j=i Jt 

(52) 



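Theorem 4.6 Assume (A1)-(A4) hold, and $\varepsilon = \mu^\gamma$ for some $1/2 < \gamma < 1$. Then we have
$\theta^\mu(\cdot)$ converges weakly to $\theta(\cdot)$ such that $\theta(\cdot)$ is the unique solution of the differential equation

$$\frac{d}{dt}\theta(t) = \sum_{i=1}^{m_0} \nu_i A^{(i)}(a_i - \theta(t)), \quad \theta(0) = \theta_0. \tag{53}$$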

5 Rates of Convergence 
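5.1 Scaled Errors: $\varepsilon = \mu$

Define $u_n := \tilde\theta_n/\sqrt{\mu} = (\alpha_n - \theta_n)/\sqrt{\mu}$. Then

$$u_{n+1} = u_n - \sqrt{\mu}\, \varphi_n \,\mathrm{sgn}(\varphi_n' \tilde\theta_n + e_n) + \frac{\alpha_{n+1} - \alpha_n}{\sqrt{\mu}}. \tag{54}$$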
In view of Theorem 3.1 there is an $N_\mu$ such that $E|\alpha_n - \theta_n|^2 = O(\mu)$ for $n \ge N_\mu$, with
which we can show $\{u_n : n \ge N_\mu\}$ is tight. In addition, take $N_\mu$ large such that by (19), we
have

$$\sum_{j=n}^{\infty} E_n(\alpha_{j+1} - \alpha_j) = O(\mu). \tag{55}$$

Then define

$$u^\mu(t) := u_n \quad \text{for } t \in [(n - N_\mu)\mu, (n - N_\mu)\mu + \mu).$$

We can then proceed to the study of the asymptotic distribution of $u^\mu(\cdot)$. As before, a
truncation device may be employed. For notational simplicity, we omit it and assume here that
the iterates are bounded.

Lemma 5.1 The sequence $\{u^\mu(\cdot)\}$ is tight in $D([0, \infty); \mathbb{R}^r)$.



Proof. Note that 



u' 



(t+s)/fM-l 

\t + s)-u>'{t) = -^ J2 9k 

k=t/ fi 



(56) 



Note that we have used the convention that $t/\mu$ denotes the integer part of $t/\mu$ in the above.
Use $E_t^\mu$ to denote the conditional expectation with respect to the $\sigma$-algebra $\mathcal{F}_t^\mu = \sigma\{u^\mu(\tau) :
\tau \le t\}$. Then by (55),

2 



E^\u''{t + s)-u''{t)f < KE^ 



k=t/fi 



(57) 



Now we examine 



E^ 



(t+s)/M-l 
k=t/ n 



mo (t+s)/At-l {t+s)/At-l 

i=l k=t/fi j=t/fj, 

{t+s)/fi~l {t+s)/fi~l 

< E E E l^kOk + oie^riAfe, + o(^,)]/{.,=.a/{.,.=.j 

k=t/fi j=t/^l 

2 5^ 

{t+s)/ti~l 

<^Erj^/i J2 (A«-A«)^fc+o('^fcj ^w=a.} 

i=l k=t/fi 

it+s)/^l~-l ^ ^ 

j=l k=t/fi 



mo 

E 

i=l 
mo 



mo 
Since $E|\tilde\theta_k|^2 = O(\mu)$ for $k$ large ($\mu$ small), in the last term of (58) we have

2 



mo 



E J2 E^Kfi 



i=l 



k=t/fj, 



mo {t+s)/ti-l 



I{a,=a^}<KfiJ2E E \^>' ^W=a.} < 0(/i)s. (59) 
i=l k=t/fj. 



For the first term we use the mixing inequality of (A4), 

2 

I{a,,=ai} 



mo 



E ^ E^Kfi 



(t+s)/^t-l 

1=1 k=t/fi 

mo {t+s)/fi~l {t+s)/fj.-l 



(t+s)/M-l 



9u 



k=t/ jx 

mo _ {t+s)/^l-l 



i=l k=t/fi j>k 

< 0{iJ,)s 



(60) 



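For any $T < \infty$ and any $0 \le t \le T$,

$$\lim_{\delta \to 0} \limsup_{\mu \to 0} \Big\{ \sup_{0 \le s \le \delta} E\big[E_t^\mu |u^\mu(s + t) - u^\mu(t)|^2\big] \Big\} = 0,$$

so $\{u^\mu(\cdot)\}$ is tight. □

Note that $g_k(a_i, i) = \varphi_k \,\mathrm{sgn}(\varphi_k'[a_i - a_i] + e_k) = \varphi_k \,\mathrm{sgn}(e_k)$. The following is a variant of
the well-known central limit theorem for mixing processes; see [3] or [10] for details.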



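Lemma 5.2 Define $w_k := \varphi_k \,\mathrm{sgn}(e_k)$. Then

$$\sqrt{\mu} \sum_{j=0}^{(t/\mu) - 1} w_j \ \text{converges weakly to a Brownian motion } \tilde w(t) \tag{61}$$

with covariance $\Sigma t$ such that the covariance $\Sigma$ is given by

$$\Sigma = E w_0 w_0' + \sum_{j=1}^{\infty} E w_j w_0' + \sum_{j=1}^{\infty} E w_0 w_j'. \tag{62}$$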
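A quick Monte Carlo sketch can illustrate the scaling in this central limit result in the simplest scalar case. Assuming (for this sketch only) i.i.d. standard normal regressors and i.i.d. zero-mean Gaussian noise with standard deviation 0.5, independent of each other, the terms $\varphi_k\,\mathrm{sgn}(e_k)$ are uncorrelated with unit variance, so the covariance constant reduces to 1 and the scaled sum at time $t$ should have variance close to $t$. The function name and sample sizes are hypothetical choices.

```python
import random


def scaled_sum(mu, t, rng):
    """sqrt(mu) * sum of phi_j * sgn(e_j) over j = 0, ..., t/mu - 1 (scalar, i.i.d. signals)."""
    n = int(round(t / mu))
    total = 0.0
    for _ in range(n):
        phi = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 0.5)
        total += phi * ((e > 0) - (e < 0))
    return mu ** 0.5 * total


rng = random.Random(1)
samples = [scaled_sum(0.01, 1.0, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print("sample mean:", round(mean, 3), "sample variance:", round(var, 3))
```

For correlated signals the cross terms in the covariance formula no longer vanish, and the sample variance would instead approach the full series.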



Theorem 5.3 $u^\mu(\cdot)$ converges weakly to $u(\cdot)$ such that $u(\cdot)$ is the solution of

$$du(t) = -A^{(\alpha(t))} u\, dt - \Sigma^{1/2} dw, \tag{63}$$

where $w(\cdot)$ is a standard Brownian motion.






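Proof. As usual, extract a convergent subsequence of $u^\mu(\cdot)$ (still denoted by $u^\mu(\cdot)$) with
limit $u(\cdot)$. We will show that for each $s, t > 0$, the limit process satisfies

$$u(t + s) - u(t) = -\int_t^{t+s} A^{(\alpha(\tau))} u(\tau)\, d\tau - \int_t^{t+s} \Sigma^{1/2} dw(\tau).$$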



Note from (56), 



(t+s)/^l-l 

u^'it + s)- u>'{t) = Yl 9k + 0(v7^) 

k=t/fi 
mo (t+s)//i-l 

= J2^-VJ^ Yl 9k]I{a,=a,} + 0{^). 
k=t/ii 



i=l 



(64) 



(65) 



Define 

9kii) ■■= gkl{ak=a,}, gkii) ■■= EkQkii), and 
Afc(z) := [gkii) - gk{ai,i) - {gk{i) -gk{ai,i))]. 

We then expand on the (negative of the) inside of the sum indexed by i in (65) as 

{t+s)/ii-l 

Y 9k{i) 

k=t/fi 

{t+s)/n~i (t+s)/fi~i (t+s)/^l-l 

= Y \^9kiai,i)+ Y y/J^[9k{i) -gk{ai,i)]+ ^ y/Ji^ki.i) (66) 

k=t/n k=t/n k=t/ii 

{t+s)//i-l {t+s)/ij.-l {t+s)/n-l 

= Yl V^^fc+ Yl f^[A'k^Uk + 0{\Uk\)]+ Y VJ^^kii)- 
k=t/fi k=t/fi k=t/ii 

Note that for the second term above we used gk{cLi,i) = o{9k) = o{y/JI\uk\) by (A3). First, 
we show the last term in (66) is o(l). Since Afc(i) is a martingale difference, we have 

{t+s)/^l-l ^ 

E 



{t+s)/^t-l 

Y v^^fc(^) 

k=t/fj, 



{t+s)/t^-l 
k=t/ fi 

Y l^^[9k{i) - gk{ai, i)]'[gk{i) - gk{ai, i)] 

k=t/ fi 

{t+s)/fi-l 

+ Y I^E\gk{i)-gk{,a^,^)]'\gk{i)-gk{ai,^)]. 

k=t/fj, 



(67) 
The boundedness of $\varphi_k$ and $u_k$ implies $\sqrt{\mu}\, \varphi_k' u_k \to 0$ in probability uniformly in $k$ as $\mu \to 0$.
Hence, the first term in (67) has

{t+S)liL-\ 

(68) 

= ^ ^E^p'^(fk[sgr^{^/JI'f'kUk + ek)-sgn{ek)]^->0 as 0. 

k=t/ 11 

Using (A3) and (A4), along with the boundedness of $u_k$, on the second term of (67) gives
^ IJ,E[gk{i) - gk{ai, i)]'[gk{i) - 9k{ai, i)] 

k=t/fi 

(t+s)/lM-l 

= Yl f^E[^A^^Uk + o{^\uk\)]'[^A^^Uk + o{^\uk\)] 

k=t/ 11 

{t+s)/fi-l 2 

= y^^E\{Af -A^^)uk + A^^Uk + o{\uk\) 

k=t/ 11 

oo {t+s)/^i-l 

</i2/s: ^ 0(A;-t//i) Y K^Oasfi^O. 

k=t/fi k=t/fi 

Hence 

{t+s)/ii-l 

—7-0 as — > 0. 

k=t/fi 

Next, in the second term of (66) we have 

(t+s)/fi-l 

Y /i[4W + o(|n,|)] 

k=t/ fi 

(i+s)//i-l {t+s)/fi~l {t+s)/^i~l 

k=t/fi k=t/fi k=t/fi 



(69) 



(70) 



Similar to the previous section, choose a sequence $m_\mu$ such that $m_\mu \to \infty$ as $\mu \to 0$ but
$\delta_\mu/\sqrt{\mu} = \sqrt{\mu}\, m_\mu \to 0$. Then



{t+s)/fi-l t+s t+s ^ lm^+m.fj,-l 



Y f^uk= Y ^i^^i^^ + X] — X] ["fc ~ ^^rnj- (71) 



k=t/^ Wp=t lon=t ^ k=lriL^ 
(72)



Since for $l m_\mu \le k \le l m_\mu + m_\mu$, $u_k - u_{l m_\mu} = O(\delta_\mu/\sqrt{\mu})$, the second term above goes to
0 in probability, uniformly in $t$. Similarly, by (A3),

{t+s)/fj.-l t+s 

k=t/^ I5fi=t ^ k=lm^ 

Likewise, $\sum_{k=t/\mu}^{(t+s)/\mu - 1} \mu\, o(|u_k|) \to 0$ in probability uniformly in $t$.
Hence, putting the above estimates together we obtain

u(t + s)- uit) = lim u^'it + s)- u'^ii) 

mo t+s {t+s)/fM~l 

i=1 k=t/^i 



i=l lSfi,=t 
rt+s rt+s 



(73) 



A"MM(r)dr- / T}/'^dw{T). 



□ 



5.2 Scaled Errors: $\varepsilon \ll \mu$

The analysis for the cases $\varepsilon \ll \mu$ and $\varepsilon \gg \mu$ is similar to that for $\varepsilon = O(\mu)$. We omit
the details and present the main results. Recall that in the $\varepsilon \ll \mu$ case, the parameter is
essentially a constant and thus we look to the initial distribution to determine the asymptotic
properties.
Define

$$\alpha^* := \sum_{i=1}^{m_0} a_i P(\alpha_0 = a_i), \qquad v_n := \frac{\alpha^* - \theta_n}{\sqrt{\mu}},$$

$$v^\mu(t) := v_n \quad \text{for } t \in [(n - N_\mu)\mu, (n - N_\mu)\mu + \mu),$$

$$A^{(*)} := \sum_{i=1}^{m_0} A^{(i)} P(\alpha_0 = a_i).$$

Then we have the following:

Theorem 5.4 Assume $\varepsilon = \mu^{1+\Delta}$ for some $\Delta > 0$. Then $v^\mu(\cdot)$ converges weakly to $v(\cdot)$ such
that $v(\cdot)$ is the solution of

$$dv(t) = -A^{(*)} v\, dt - \Sigma^{1/2} dw, \tag{74}$$

where $w(\cdot)$ is a standard Brownian motion.
5.3 Scaled Errors: $\varepsilon \gg \mu$

Again, here the idea is that the parameter varies so quickly that it quickly converges to
the stationary distribution $\nu = (\nu_1, \ldots, \nu_{m_0})$. Thus we look to the expectation against the
stationary distribution to determine the asymptotic properties.
Define

$$\bar\alpha := \sum_{i=1}^{m_0} a_i \nu_i, \qquad z_n := \frac{\bar\alpha - \theta_n}{\sqrt{\mu}},$$

$$z^\mu(t) := z_n \quad \text{for } t \in [(n - N_\mu)\mu, (n - N_\mu)\mu + \mu),$$

$$\bar A := \sum_{i=1}^{m_0} A^{(i)} \nu_i.$$

We have the following result.

Theorem 5.5 Assume $\varepsilon = \mu^\gamma$ for some $1/2 < \gamma < 1$. Then $z^\mu(\cdot)$ converges weakly to $z(\cdot)$
such that $z(\cdot)$ is the solution of

$$dz(t) = -\bar A z\, dt - \Sigma^{1/2} dw, \tag{75}$$

where $w(\cdot)$ is a standard Brownian motion.

6 Numerical Examples 

Here we demonstrate the performance of the Sign-Error (SE) algorithm and compare it
with the Sign-Regressor (SR) and Least Mean Squares (LMS) algorithms (see [28] and [32],
respectively). We fix the step size $\mu = 0.05$ and consider three cases: $\varepsilon = (3/5)\mu$ ($\varepsilon = O(\mu)$);
$\varepsilon = \mu^2$ (a slowly varying Markov chain); and $\varepsilon = \sqrt{\mu}$ (a fast Markov chain).
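For reference, the three update rules being compared can be written side by side in the scalar case. The SE step is recursion (2); the SR and LMS steps below use their standard textbook forms, which is an assumption of this sketch since the details of [28] and [32] are not reproduced here. The function names are hypothetical.

```python
def sign(x):
    """sgn(x) = 1 for x > 0, -1 for x < 0, 0 for x = 0."""
    return (x > 0) - (x < 0)


def se_step(theta, phi, y, mu):
    """Sign-error update: only the sign of the prediction error is used."""
    return theta + mu * phi * sign(y - phi * theta)


def sr_step(theta, phi, y, mu):
    """Sign-regressor update (standard form): only the sign of the regressor is used."""
    return theta + mu * sign(phi) * (y - phi * theta)


def lms_step(theta, phi, y, mu):
    """Least mean squares update (standard form): full regressor and full error."""
    return theta + mu * phi * (y - phi * theta)


theta, phi, y, mu = 0.0, 2.0, 1.0, 0.05
print(se_step(theta, phi, y, mu), sr_step(theta, phi, y, mu), lms_step(theta, phi, y, mu))
```

The SE step size does not shrink as the prediction error shrinks, which is one way to read the larger fluctuations of the SE estimates between parameter jumps reported below.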

We use state space $\mathcal{M} = \{-1, 0, 1\}$ with transition matrix $P^\varepsilon = I + \varepsilon Q$, where

$$Q = \begin{pmatrix} -0.6 & 0.4 & 0.2 \\ 0.2 & -0.5 & 0.3 \\ 0.4 & 0.1 & -0.5 \end{pmatrix}$$

is the generator of a continuous-time Markov chain whose stationary distribution is
$\nu = (1/3, 1/3, 1/3)$. Hence $\bar\alpha = \sum_{i=1}^{3} a_i \nu_i = 0$. We take the initial distribution for $\alpha_0$ to be
$(3/4, 1/8, 1/8)$, so $\alpha^* = \sum_{i=1}^{3} a_i P(\alpha_0 = a_i) = -0.625$. The sequences $\{\varphi_n\}$ and $\{e_n\}$ are i.i.d.
$\mathcal{N}(0, 1)$ and $\mathcal{N}(0, 0.25)$, respectively. We proceed to observe 1000 iterations of the algorithm for the
cases $\varepsilon = O(\mu)$ and $\varepsilon \gg \mu$, and 10,000 iterations for the case $\varepsilon \ll \mu$ (in order to illustrate some
variations of the parameter).

Figure 1: Markov chain parameter process and estimates obtained by adaptive filtering with
$\varepsilon = O(\mu)$ (true parameter, SE estimate, SR estimate, and LMS estimate plotted against iteration).

Figure 2: Markov chain parameter process and estimates obtained by adaptive filtering.

Figure 3: Markov chain parameter process and estimates obtained by adaptive filtering.
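The constants quoted for this example are easy to verify directly. The following sketch checks that the uniform vector is stationary for the generator (that is, $\nu Q = 0$) and recomputes the stationary average and the initial average of the three states; the variable names are hypothetical.

```python
Q = [[-0.6, 0.4, 0.2],
     [0.2, -0.5, 0.3],
     [0.4, 0.1, -0.5]]
states = [-1.0, 0.0, 1.0]
nu = [1.0 / 3.0] * 3            # claimed stationary distribution
p0 = [3.0 / 4.0, 1.0 / 8.0, 1.0 / 8.0]   # initial distribution of alpha_0

# Stationarity: every component of the row vector nu * Q should vanish.
nu_Q = [sum(nu[i] * Q[i][j] for i in range(3)) for j in range(3)]
# Average of the states under the stationary and the initial distribution.
alpha_bar = sum(v * a for v, a in zip(nu, states))
alpha_star = sum(p * a for p, a in zip(p0, states))

print("nu Q       =", nu_Q)
print("alpha_bar  =", alpha_bar)
print("alpha_star =", alpha_star)
```

The uniform vector is stationary here because every column of the generator sums to zero as well as every row.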

To observe the tracking behavior of the SE algorithm, in comparison to the SR and LMS
algorithms, we overlay the respective plots for each case. When $\varepsilon = O(\mu)$, the LMS and
SR estimates tend to be approximately equal, while the SE estimates show more deviations
from the other estimates. The SE algorithm responds to changes in the parameter more
quickly, while the LMS and SR algorithms adhere to the parameter more closely while it is
stationary. In the $\varepsilon \ll \mu$ case, we see this behavior repeated. While all three estimates track
the parameter closely, the LMS and SR estimates deviate from the parameter less than the
SE estimates between jumps of the parameter.

Figure 4: Scaled error $z_n$ with fast-varying Markov chain ($\varepsilon \gg \mu$): diffusion behavior.

Figure 5: Average of parameter process and estimates over time with $\varepsilon \gg \mu$.

In the $\varepsilon \gg \mu$ case, none of the algorithms can track the parameter at each iterate very
well. However, when we observe the scaled error $z_n$ against the stationary distribution of the
Markov chain, the expected diffusion behavior is displayed. Examining the cumulative
average of the parameter and the estimates of the iterates, we note that the parameter
average quickly converges to $\bar\alpha$. The LMS and SR estimate averages adhere closely to the
parameter average, while the SE estimate average deviates slightly more.